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ABSTRACT 

Motivation: The comparison of diverse genomic datasets is 
fundamental to understanding genome biology. Researchers must 
explore many large datasets of genome intervals (e.g., genes, 
sequence alignments) to place their experimental results in a 
broader context and to make new discoveries. Relationships between 
genomic datasets are typically measured by identifying intervals 
that intersect: that is, they overlap and thus share a common 
genome interval. Given the continued advances in DNA sequencing 
technologies, efficient methods for measuring statistically significant 
relationships between many sets of genomic features is crucial for 
future discovery. 

Results: We introduce the Binary Interval Search (BITS) algorithm, 
a novel and scalable approach to interval set intersection. We 
demonstrate that BITS outperforms existing methods at counting 
interval intersections. Moreover, we show that BITS is intrinsically 
suited to parallel computing architectures such as Graphics 
Processing Units (GPUs) by illustrating its utility for efficient 
Monte-Carlo simulations measuring the significance of relationships 
between sets of genomic intervals. 
Availability: https://github.com/arq5x/bits 
Contact: arq5x@virginia.edu 



1 INTRODUCTION 

Searching for intersecting intervals in multiple sets of genomic 
features is crucial to nearly all genomic analyses. For example, 
interval intersection is used to compare ChIP enrichment between 
experiments and cell types, identify potential regulatory targets, 
and compare genetic variation among many individuals. Interval 
intersection is the fundamental operation in a broader class 
of "genome arithmetic" techniques, and as such, it underlies 
the funct ionality found in many genome analysis software 
packages llKent et all I2002I : iGiardine et all |2005| ; iLi et all I2009I : 
lOuinlan and HallLl201 lh . ~ 

As high throughput sequencing technologies have become the 
de facto molecular tool for genome biology, there is an acute 
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need for efficient approaches to interval intersection. Microarray 
techniques for measuring gene expression and chromatin states 
have been largely supplanted by sequencing-based techniques, and 
whole-exome and whole-genome experiments are now routine. 
Consequently, most genomics labs now conduct analyses including 
datasets with billions of genome intervals. Experiments of this 
size require substantial computation time per pair-wise comparison; 
moreover, typical analyses require comparisons to many large sets 
of genomic features. Existing approaches scale poorly and are 
already reaching their performance limits. We therefore argue the 
need for new scalable algorithms to allow discovery to keep pace 
with the scale and complexity of modern datasets. 

In this manuscript, we introduce the Binary Interval Search 
(BITS) algorithm as a novel and scalable solution to the fundamental 
problem of counting the number of intersections between two sets of 
genomic intervals. BITS uses two binary searches (one each for start 
and end coordinates) to identify intersecting intervals. As such, our 
algorithm executes in O(iVlogiV) time, where N is the number 
of intervals, which can be shown to be optimal for the interval 
intersection counting problem by a straight-for ward reduction to 
eleme nt uniqueness (known to be Q(N log N) jMirsa and GriesL 
1982)). We illustrate that a sequential version of BITS outperforms 
existing approaches. We also demonstrate that BITS is intrinsically 
suited to parallel architectures. The parallel version performs the 
same amount of work as the sequential version (i.e., there is no 
overhead) which means the algorithm is work-efficient, and because 
each parallel thread performs equivalent work, BITS has little thread 
divergence. While thread divergence degrades performance on any 
architecture (finished threads must wait for over-burdened threads 
to complete), the impact is particularity acute for GPUs where 
threads share a program counter and any divergent instruction must 
be executed on every thread. 

1.1 The Interval Set Intersection problem 

We begin by reviewing some basic definitions. A genomic interval 
is a continuous stretch of a genome with a chromosomal start and 
end location (e.g., a gene), and a genomic interval set is a collection 
of genomic intervals (e.g., all known genes). Two intervals a and 
b intersect when (a. start < b.end) and (a.end > b. start). The 
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intersection of two interval sets A — {ai, 02, . . . , ffliv} and B — 
{61, &2, • ■ • , &m} is the set of interval pairs: 

I(A,B) = {(a,b)\a£ A,beB, 

a. start < b.end A a. end > b. start} 

Intervals within a set can intersect, but self-intersections are not 
included in T(A,B). There are four natural sub-problems for 
interval set intersection: 1) decision - does there exist at least 
one interval in A that intersects an interval in Bl; 2) counting 
- how many total intersections exist between sets A and Bl; 3) 
per-interval counting - how many intervals in B intersect each 
interval in Al; 4) enumeration - what is the set of each pair- wise 
interval intersections between A an Bl While BITS solves all four 
sub-problems, it is designed to efficiently count the number of 
intersections between two sets, and as such, it excels at the decision, 
counting, andper-interval counting problems. 

1.2 Limits to parallelization 

Interval intersection has many applications in genomics, and as 
such, several algorithms have been developed that, in general , 
are e ither based on trees iKent et aZl 1 20021 : lAlekseve nko and LeeL 
l2007h . or linear sweeps of pre- sorted intervals TRknM^sonn2*00^V 

The UCSC genome browser introduced a widely-used scheme 
based on R-trees. This approach partitions intervals from one 
dataset into hierarchical "bins". Intervals from a second dataset are 
then compared to matching bins (not the entire dataset) to narrow 
the search for intersections to a focused portion of the genome. 
While this approach is used by t he UCSC Genome Browser, 
BED Tools touinlan and Hall 1201 lh . and SAMTOOLS jLi et all 
l2009h . the algorithm is inefficient for counting intersections since 
all intervals in each candidate bin must be enumerated in order to 
count the intersections. Since the number of intersections is at most 
quadratic, any enumeration-based algorithm is 0(N 2 ). 

Moreover, these existing approaches are poor candidates for 
parallelization. Thread divergence can be a significant problem 
for hierarchical binning methods. If intervals are not uniformly 
distributed (e.g., exome sequencing or RNA-seq), then a small 
number of bins will contain many intervals while most other bins 
are empty. Consequently, threads searching full bins will take 
substantially longer than threads searching empty bins. In contrast, 
BITS counts intersections directly without enumerating intersecting 
intervals and therefore, the underlying interval distribution does not 
impact the relative workload of each thread. 

Recent versions of BEDTools and BEDOPS jNeph et a/.Ll2012h 
conduct a linear "sweep" through pre-sorted datasets while 
maintaining an auxiliary data structure to track intersections as 
they are encountered. While the complexity of such sequential 
sweep algorithms is theoretically optimal, the amount of parallelism 
that exists is limited, and some overhead is required to guarantee 
correctness . Any linear sweep algori t hm mu st maintain the "sweep 
invariant" ( TMcKennev and McGuireL l2009h . which states that all 
segment starts, ends, and intersections behind the sweep must 
be known. A parallel sweep algorithm must either partition the 
input space such that each section can be swept in parallel without 
violating the invariant, or threads must communicate about intervals 
that span partitions. In the first case parallelism is limited to the 
number of partitions that can be created, and threads can diverge 
when the number of intervals in each partition are unbalanced. In the 



second case, the communication overhead between threads prevents 
work efficiency and can have significant performance implications. 
In BITS, the amount of parallelism depends only on the number of 
intervals and not the distribution of intervals within the input space, 
and there is no communication between threads. 



2 METHODS 

A seemingly facile method for finding the intersection of A and B 
would be to treat one set, A, as a "query" set, and the other, B, as 
a "database". If all of the intervals in the database were sorted by 
their starting coordinates, it would seem that binary searches could 
be used for each query to identify all intersecting database intervals. 

However, this apparently straight-forward searching algorithm is 
complicated by a subtle, yet vexing detail. If the intervals in B are 
sorted by their starting positions, then a binary search of B for the 
query interval end position ai.end will return the interval bj 6 B, 
where bj is the last interval in B that starts before interval at ends 
(e.g, interval e in Fig. 1 A). This would seem to imply that if bj does 
not intersect at, then no intervals in B intersect at, and if bj does 
intersect ai, then other intersecting intervals in B could be found 
by scanning the intervals starting before bj in decreasing order, 
stopping at the first interval that does not intersect a*. However, 
this technique is complicated by the possibility of intervals that are 
wholly contained inside other intervals (e.g., interval c in Fig. IB). 

An interval bj £ B is "contained" if there exists an interval 
fefe 6 B where bu .start < bj. start and bj.end < bk-end. 
Considering such intervals, if the interval found in the previous 
binary search bj does not intersect the query interval a*, we cannot 
conclude that no interval in B intersects at, because there may exist 
an interval bj- x £ B where bj- x .end > ai. start. Furthermore, if 
bj does intersect ai, then the subsequent scan for other intersecting 
intervals cannot stop at the first interval that does not intersect 
ai\ it is possible that some earlier containing interval intersects 
Oi. Therefore, the scan is forced to continue until it reaches the 
beginning of the list. As contained intervals are typical in genomic 
datasets, a naive binary search solution is inviable. 

2.1 Binary Interval Search (BITS) Algorithm 

We now introduce our new Binary Interval Search (BITS) algorithm 
for solving the interval set intersection problem. This algorithm uses 
binary searches to identify interval intersections while avoiding the 
complexities caused by contained intervals. The key observation 
underlying BITS is that the size of the intersection between two 
sets can be determined without enumerating each intersection. For 
each interval in the query set, two binary searches are performed to 
determine the number of intervals in the database that intersect the 
query interval. Each pair of searches is independent of all others, 
and thus all searches can be performed in parallel. 

Existing methods define the intersection set based on inclusion: 
that is, the set of intervals in the interval database B that end after the 
query interval ai begins, and which begin before ai ends. However, 
we have seen that contained intervals make it difficult to find this set 
directly with a single binary search. 

Our algorithm uses a different, but equivalent, definition of 
interval intersection based on exclusion: that is, by identifying the 
set of intervals in B that cannot intersect ai, we can infer how many 
intervals must intersect ai. Formally, we define the set of intervals 
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Algorithm 1: Single interval intersection counter 
Input: Sorted interval starts and ends Bg and Be, query interval a 
Output: Number of intervals c intersecting a 

Function ICount(£?s, B e ,cl) begin 
first <— BSearch(_B s , a.end) 
last <— BSearch(S b , a. start) 

c «— first — last /* = \B\ — (last + (\B\ — first)) */ 
return c 



Query does not 

P intersect interval c. 
Thus falsely conclude that 
query does not intersect 
a b C d e f g intervals a or b. 
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Step 1: Binary search of query.start 
into database sorted by 
end coordinates. This excludes 
database interval c. 



Step 2: Binary search of query.end 

into database sorted by 
start coordinates. This excludes 
database intervals e,fg. 



Step 3: Infer the count of intervals 
that intersect the query by substracting 
the size of the excluded 
set from the size of the database 



Fig. 1. Comparing a naive binary search for interval intersection to the BITS 
approach. A. Binary searches of intervals sorted by start coordinate will 
occasionally identify overlapping intervals. However, contained intervals 
prevent knowing how far one must scan the database to identify all 
intersections. B. Contained intervals also cause single binary searches to 
falsely conclude that no intersections exist for a given query interval. C. To 
overcome these limitations, BITS uses two binary searches of the database 
intervals: one into a sorted list of end coordinates and the other into a sorted 
list of start coordinates. Each search excludes database intervals that cannot 
intersect the query, leaving solely the intervals that must intersect the query. 



X(B,a,i) £ B that intersect query interval m £ A to be the intervals 
in B that are neither in the set of intervals ending before ( "left of", 
set £ below) ch begins, nor in the set of intervals starting after ("right 
of", set 1Z below) a% ends. That is: 

£(B,cii) ={b G B\b.end < ai. start} 

7Z(B,a,i) ={b £ B\b. start > ai.end} 

X(B, ai ) =B \ (£(B, Oi) U 1Z(B, ai )) 

Finding the intervals in X(ai,B) for each ai £ A by taking the 
difference of B and the union of £(B,ai) and lZ(B,ai) is not 
efficient. However, we can quickly find the size of £(B,ai) and 
the size lZ(B,ai), and then infer the size of X(B,ai). With the 
size of X(B, ai), we can directly answer the decision problem, the 
counting problem, and the per-interval counting problems. The size 
of X(B, ai) also serves as the termination condition for enumerating 
intersections that was missing in the naive binary search solution. 



The BITS algorithm is based upon one fundamental function, 
ICount(Bs, B e , ai) = \X(B,ai)\ (Algorithm [TJ, which 
determines the number of intervals in the database B that intersect 
query interval ai. As shown in Fig. 1C, the database B is split 
into two integer lists Bs — start, b2 start, ... , &m .start] 
and Be = [bi.end, b^.end, . . . , bM-end], which are each sorted 
numerically in ascending order. Next, two binary searches are 
performed, last — BSearch(Be, at. start) and first — 
B Searches, ai.end). Since Be is a sorted list of each interval 
end coordinate in B, the elements with indices less than or equal to 
last in Be correspond to the set of intervals in B that end before 
ai starts (i.e., to the "left" of ai). Similarly, the elements with 
indices greater than or equal to first in Bs correspond to the set of 
intervals in B that start after ai ends (i.e., to the "right" of ai). From 
these two values, we can directly infer the size of the intersection set 
X(B, ai) (i.e., the count of intersections in B for ai): 

\B\- first =\n{B, ai )\ 

last =\£(B,ai)\ 

\B\ - (last + (\B\ - first)) =\X(B, 0i )| 

Using the subroutine ICount(Bs, Be, ai), all four interval set 
intersection problem variants can be solved. Pseudocode for the 
decision, per-interval counting, and enumeration sub-problems can 
be found in the Supplemental Materials. 

The BITS solution to the counting problem. Since BITS operates 
on arrays of generic intervals ((start, end)), and input files 
are typically chromosomal intervals ((chrom, start, end)), the 
intervals in each dataset are first projected down to a one- 
dimensional generic interval. This is a straight forward process 
that adds an offset associated with the size of each chromosome 
to the start and end of each interval. Once the inputs are projected 
to interval arrays A and B, the COUNTER (Algorithm O sets the 
accumulator variable c to zero; then for each at G A, accumulates 
c = c + ICount(Bs, BE,ai). The total count c is returned. 

2.2 Time Complexity Analysis 

The time complexity of BITS is 0((\A\ + log which 
can be shown to be optimal by a straight-for ward reduction to 
eleme nt uniqueness (known to be 6(N log N) dMirsa and GriesL 
|l982l) ). To compute ICount(Bs, Be, ai) for each a t in A, the 
interval set B is first split into two sorted integer lists Bs and 
Be, which requires 0(|B| log \ B\) time. Next, each instance of 
ICount(£?s, Be, ai) searches both Bs and Be, which consumes 
0(| A log \B\) time. For the counting problems, combining the 
results of all ICount(£?s, Be, ai) instances into a final result can 
be accomplished in 0(|A|) time. 
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Algorithm 2: Interval intersection counter 
Input: Database interval array B and query interval array A 
Output: Number of intersections c between A and B 

Function Counter(A, B) begin 

Bs [bl .start, . . . , .start] 

Be +- [61. end, . . . , 6| B | .end] 

Sort(B s ) 

Sort(Be) 

c «- 

for i +- 1 to |A| do 

[_ c <- c + ICount(_B s ,_B e , 

return c 



2.3 Parallel BITS 

Performing a single operation independently on many different 
inputs is a classic parallelization scenario. When based on the 
subroutine ICount(Bs, Be, a), which is independent of all 
ICoUNT(i?s, Be, x) for intervals x in the query set where a 7^ 
x, counting interval intersections is a pleasingly parallelizable 
problem that easily maps to a number of parallel architectures. 

NVIDIA's CUDA is a single instruction multiple data (SIMD) 
architecture that provides a general interface to a large number 
of parallel GPUs. The GPU is organized into multiple SIMD 
processing units, and the processors within a unit operate in lock- 
step. The BITS algorithm is especially well suited for the this 
architecture for a number of reasons. First, CUDA is optimized 
to handle large numbers of threads. By assigning each thread 
one instance of ICount(Bs, Be, a), the number of threads will 
be proportional to the input size. CUDA threads also execute in 
lock-step and any divergence between threads will cause reduced 
thread utilization. While there is some divergence in the depth 
of each binary search performed by ICount(Bs, Be, a), it 
has an upper bound of 0(log\B\). Outside of this divergence 
ICount(_Bs, Be, a) is a classic SIMD operation l lKirk and Hwul 
l2010h . Finally, the only data structure required for this algorithm 
is a sorted array, and thanks to years of research in this area, 
current GPU sorting algorithms can sort billions of integ ers within 
seconds jMerrill and GrimshawLl201 lhlSatish et a/.Ll2009n . 



3 RESULTS 

3.1 Comparing BITS to extant sequential approaches 

We implemented a sequential version of the BITS algorithm 
("BITS-SEQ") as a stand-alone C++ utility. Here we assess 
the performance of this implementation relative to B EDTOOLS 
inte rsect and UCSC Genome Browser's ("UCSC") l lKent et al[ 
120021) bedlntersect utilities (see Supplemental Materials for 
details). We compare the performance of each tool for counting the 
total number of observed intersections between sets of intervals of 
varying sizes (Figure 2). The comparisons presented are based on 
sequence alignmen ts for the CEU individual NA12878 by the 100 
Genomes Project l lThe 1000 Genomes Project Consortiuml l2010h . 
as well as RefSeq exons. Owing to the different data structures used 
by each algorithm, the relative performance of each approach may 
depend on the genomic distribution of intervals within the sets. As 
discussed previously, tree-based solutions that place intervals into 



hierarchical bins may perform poorly when intervals are unevenly 
distributed among the bins. We tested the impact of differing interval 
distributions on algorithm performance by randomly sampling 1 
and 10 million alignment intervals from both whole-genome and 
exome-capture datasets for NA12878 (see Supplemental Materials). 
Each algorithm was evaluated considering three different interval 
intersection scenarios. First, we tested intervals from different 
distributions by comparing the intersection between exome-capture 
alignments and whole-genome alignments. Since each set has a 
large number of intervals and a different genomic distribution, 
we expect a small (relative to the total set size) number of 
intersections. We also tested a uniform distribution by counting 
intersections between Refseq exons and whole-genome sequencing 
alignments. Here each interval set is, for the most part, evenly 
distributed throughout the genome; thus, we expect each exon to 
intersect roughly the same number of sequencing intervals, and a 
large number of sequencing intervals will not intersect an exon. 
Lastly, we assessed a biased intersection distribution between 
exons and exome-capture alignments. By design, exome sequencing 
experiments intentionally focus collected DNA sequences to the 
coding exons. Thus, the vast majority of sequence intervals will 
align in exonic regions. In contrast to the previous scenario, nearly 
every exon interval will have a large number of sequence interval 
intersections, and nearly all sequencing intervals will intersect an 
exon. 



3.1.1 BITS excels at counting intersections. In all three interval 
distribution scenarios, the sequential version of BITS had superior 
runtime performance for counting intersections. BITS was between 
11.2 and 27.9 times faster than BEDTools and between 1.2 and 
5.2 times faster than UCSC (Figure |2). This behavior is expected; 
whereas the BEDTools and UCSC tree-based algorithms must 
enumerate intersections to derive the count, BITS infers the 
intersection count by exclusion without enumeration. 



3.1.2 BITS excels at large intersections and biased distributions. 
The relative performance gains of the BITS approach are enhanced 
for very large datasets (Figure 2B). Since tree-based methods have 
a fixed number of bins, and searches require a linear scan of each 
associated bin, the number of intervals searched grows linearly with 
respect to the input size. In the worst-case where all intervals are in a 
single bin, a search would scan the entire input set. In contrast, BITS 
employs binary searches so the number of operations is proportional 
to log of the input size, regardless of the input distribution. 

Similarly, exome-capture experiments yield biased distributions 
of intervals among the UCSC bins. Consequently, most bins in 
tree-based methods will contain no intervals, while a small fraction 
contain many intervals. When the query intervals have the same 
bias, the overhead of the UCSC algorithm is more onerous, as a 
small number of bins are queried and each queried bin contains 
many intersecting intervals that must be enumerated in order to 
count overlaps. As the BITS algorithm is agnostic to the interval 
distributions, it will outperform the UCSC algorithm (Figure 2A, 
2B) for common genomic analyses such as ChlP-seq and RNA-seq, 
especially given the massive size of these datasets. 
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Fig. 2. Run times for counting intersections with BITS, BEDTooIs, and UCSC "Kent source". A. Run times for databases of 1 million alignment intervals 
from each interval distribution. B. Run times for databases of 10 million alignment intervals from each interval distribution. Bars reflect the mean run time 
from five independent experiments and error bars describe the standard deviation. Gray bars reflect the run time consumed by data structure construction, while 
white bars are the time spent counting intersections. Above each BITS execution time we note the speed increase relative to BEDTooIs and "Kent source", 
respectively. "Exons" represents 400,351 RefSeq exons (autosomal and X, Y) for the human genome (Build 37). BED = BEDTooIs; UCSC = "Kent source". 



3.2 Applications for Monte Carlo Simulations 

Identifying statistically significant relationships between sets of 
genome intervals is fundamental to genomic research. However, 
owing to our complex evolutionary history, different classes of 
genomic features have distinct genomic distributions, and as such, 
testing for significance can be challenging. One widely-used, yet 
computationally intensive alternative solution is the use of Monte 
Carlo simulations that compare observed interval relationships 
to an expectation based on randomization. All aspects of the 
BITS algorithm are particularly well suited for Monte Carlo (MC) 
simulations measuring relationships between interval sets. As 
described, all intersection algorithms begin detecting intersections 
between two interval sets by setting up their underlying data 
structures (e.g., trees or arrays). The BITS setup process involves 
mapping each interval from the two-dimensional chromosomal 
interval space (i.e., chromosome and start/end coordinates) to a 
one dimensional integer interval space (i.e., start/end coordinates 
ranging from 1 to the total genome size). Once the intervals are 
mapped, arrays are sorted by either start or end coordinates. In 
contrast, the UCSC setup places each interval into a hash table. As 
shown in Figure(2] data structure setup is a significant portion of the 
total runtime for all approaches. 

However, in the case of many MC simulation rounds, where a 
uniformly distributed random interval set is generated and placed 
into the associated data structure, the setup step is faster in BITS, 
whereas the setup time remains constant in each simulation round 
for UCSC. For BITS, the mapping step is skipped in all but the the 
first round and in each simulation round only an array of random 
starts must be generated. The result is a 6x speedup for MC rounds 
over the cost of the initial intersection setup. For UCSC, both the 
chromosome and the interval start position must be generated and 
then placed into the hash table with no change in execution time. 



This speedup in BITS is extended on parallel platforms, where 
the independence of each intersection is combin ed with efficient 



parall el random number generation algorithms dTzen g and Wei, 
2008 ) and p a rallel sorting algorithms dMerrill and GrimshawlHoTlt 
Satish et all l2009h . Monte Carlo simulations have obvious task 
parallelism since each round is independent. BITS running on 
CUDA ("BITS-CUDA") goes a step further and exposes fine-grain 
parallelism in both the setup step, with parallel random number 
generation and parallel sorting, and the intersection step where 
hundreds of intersections execute in parallel. The improvement is 
modest for a single intersection (only parallel sorting can be applied 
to the setup step) where BITS-CUDA is 4x faster than sequential 
BITS and 40x faster than sequential UCSC. However, as the 
number of MC rounds grows performance improves dramatically. 
At 10,000 MC rounds and le7 intervals, BITS-CUDA is 267x faster 
than sequential BITS and 3,414x faster than sequential UCSC. An 
improvement of this scale allows MC analyses for thousands of 
experiments (e.g., 25,281 pairwise comparisons in Section [3~3l >. 

We demonstrate the improved performance of BITS over UCSC 
for Monte Carlo simulations for measuring the significance of the 
overlaps between interval sets in Table [T] As both the number of 
MC rounds and the size of the dataset grows, the speedup of both 
sequential BITS and BITS-CUDA increases over UCSC. For the 
largest comparison (le7 intervals and 10,000 iterations), BITS-SEQ 
is 12x faster that UCSD, and BITS-CUDA is 267x faster than BITS- 
SEQ and 3,414x faster than sequential UCSC. 



3.3 Uncovering novel genomic relationships. 

The efficiency of BITS for Monte Carlo applications on 
GPU architectures provides a scalable platform for identifying 
novel relationships between large scale genomic datasets. To 
illustrate BITS-CUDA's potential for large-scale data mining 
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experiments, we conducted a screen for significant genomic 
co-localization among 159 genome annotation tracks using 
Monte Carlo simulation (see Supplemental Materials). This 
analysis was bas ed upon functional annotat i ons f rom the 
ENCODE project l lENCODE Project Consortiuni l2007h for the 
GM12878, Hl-hESC, and K562 cell lines, including assays 
for 24 transcriptions factors (often with replicates), 8 histone 
modifications, open chromatin, and DNA methylation. We also 
included diverse genome annotations from the UCSC genome 
browser (e.g, repeats, genes, and conserved regions). 

Using BITS-CUDA, we measured the log2 ratio of the observed 
and expected number of intersections for each of the 25,281 (i.e., 
159*159) pairwise dataset relationships using le4 Monte Carlo 
simulations (Figure 3). As expected, this analysis revealed that 
1) the genomic locations for the same functional element are 
largely consistent across replicates and cell types, 2) methylated 
and semi-methylated regions are similar across cell types, and 3) 
most functional assays were anti-correlated with genomic repeats 
(e.g., microsatellites) owing to sequence alignment strategies that 
exclude repetitive genomic regions. Perhaps not surprisingly, this 
unbiased screen also revealed intriguing patterns. First, the strong 
enrichment among all transcription factors (TF) assays suggests that 
a subset of TF binding sites are shared among all factors. This 
observation is consiste nt with previous descriptions of "hot regions" 
l lGerstein et ali l2o"l(if) . In addition, there is a significant, specific, 
and unexplained enrichment among the Six5 transcription factor and 
segmental duplications. 

Pursuing the biology of these relationships is beyond the scope 
of the current manuscript; however, we emphasize that the ability 
to efficiently conduct such large-scale screens facilitates novel 
insights into genome biology. This analysis presented a tremendous 
computational burden made feasible by the facility with which the 
BITS algorithm could be applied to GPU architectures. Indeed, 
each iteration of our Monte Carlo simulation tested for intersections 
among 4 billion intervals among the 25 thousand datasets, yielding 
over 44 trillion comparisons for the entire simulation. Whereas 
this simulation took just over 6 days (9,069 minutes) on a single 
computer with one GPU card, we estimate that it would take at 
least 112 traditional processors to conduct the same analysis using 
traditional approaches such as the UCSC tools or BEDTools. 



Table 1. Runtime (seconds) comparison for Monte Carlo simulations. 
Timings in italics were extrapolated owing to long run times. 



Number of MC iterations 



Size 


Tool 


1 


100 


1000 


100000 


le5 


BITS-CUDA 


0.73 


1 


4 


28 




BITS-SEQ 


0.41 


7 


68 


680 




UCSC 


0.17 


14 


138 


1,381 


le6 


BITS-CUDA 


2 


3 


1 


103 




BITS-SEQ 


5 


120 


1,200 


12,000 




UCSC 


6 


878 


8,780 


87,800 


lc7 


BITS-CUDA 


14 


22 


97 


835 




BITS-SEQ 


66 


2,235 


22,350 


223,500 




UCSC 


568 


28,508 


285,080 


2,850,800 



4 CONCLUSION 

We have developed a novel algorithm for interval intersection that is 
uniquely suited to scalable computing architectures such as GPUs. 
Our algorithm takes a new approach to counting intersections: 
unlike existing methods that must enumerate intersection in order 
to derive a count, BITS uses two binary searches to directly infer 
the count by excluding intervals that cannot intersect one another. 

We have demonstrated that a sequential implementation of BITS 
outperforms existing tools and illustrate that, because it is based on 
binary searches (which have predictable complexity), BITS is task 
efficient and is thus highly parallelizable. As such, we show that a 
GPU implementation of BITS (BITS-CUDA) is a superior solution 
for Monte Carlo analyses of statistical relationships between sets 
of genome intervals, since observed intersections among many sets 
must be compared to thousands of randomized simulations. 

Given the efficiency with which the BITS algorithm counts 
intersections, it is also perfectly suited to many fundamental 
genomic analyses including RNA-seq transcript quantification, 
ChlP-seq peak detection, and searches for copy-number and 
structural variation. Moreover, the functional and regulatory 
data produced by projects suc h as ENCODE hav e driven the 
development of new approaches jFavorov et a/.Ll2012l) to measuring 
relationships among genomic features in order to reveal yet 
undetected insights into genome biology. We recognize the 
importance of scalable approaches to detecting such relationships 
and anticipate that our new algorithm will foster new genome 
mining tools for the genomics community. 
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Fig. 3. BITS-CUDA measurements of spatial correlations among 159 genome features from the ENCODE project and from the UCSC Genome Browser. For 
each comparison, we show an enrichment score reflecting the log2 ratio of the observed count of intersections over the median count of intersections from 
10000 MC simulations. Each set of three labels on the X and Y axes correspond to three consecutive rows or columns, respectively. Assays from the GM12878 
cell line are in green, Hl-hESC in blue, and K562 in red. Annotation tracks from the UCSC Genome Browser are in black. 
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